I have shown under what conditions the previous strategies can be supported as a NE, but this does not ensure that they are subgame perfect. A strategy profile is a SPE if and only if it satisfies the one-deviation property: no player can increase their payoff by changing their action for a single period after any history. Given this definition, not all of the above strategies are SPE. The strategies that are both NE and SPE are AD (for any \(\delta\)), Grim trigger (for \(\delta\ge\delta_{Grim}\)) and WSLS (for \(\delta\ge\delta_{WSLS}\)).
TFT
TFT only reacts to the previous outcome, so there are 4 histories to check: (C, C), (C, D), (D, C) and (D, D), where the first entry is player 1's action. Assume player 2 is playing TFT.
After (C, C), sticking to TFT gives \(\frac{R}{1-\delta}\). Deviating to D for one period earns T, but then player 1 returns to C at the same time as player 2 retaliates with D, so player 1 gets S. The players then alternate between (D, C) and (C, D), so the deviation payoff is:
\[T+\delta S+\delta^2T+\delta^3S+\cdots=\frac{T+\delta S}{1-\delta^2}
\]
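As a quick numerical sanity check of this closed form, the sketch below compares a long truncated version of the alternating series with \(\frac{T+\delta S}{1-\delta^2}\). The function names and the values \(T=50\), \(S=12\), \(\delta=0.9\) are illustrative assumptions.

```python
def alternating_sum(first: float, second: float, delta: float, periods: int) -> float:
    """Truncated discounted sum first + delta*second + delta^2*first + ..."""
    total = 0.0
    discount = 1.0
    for t in range(periods):
        total += discount * (first if t % 2 == 0 else second)
        discount *= delta
    return total


def alternating_closed_form(first: float, second: float, delta: float) -> float:
    """Closed form (first + delta*second) / (1 - delta^2), valid for 0 <= delta < 1."""
    return (first + delta * second) / (1 - delta**2)


if __name__ == "__main__":
    T, S, delta = 50.0, 12.0, 0.9  # illustrative values, not taken from the text
    print(alternating_sum(T, S, delta, 2000))
    print(alternating_closed_form(T, S, delta))
```

With 2000 periods the neglected tail is of order \(\delta^{2000}\), so the two printed numbers agree to many decimal places.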
For SPE we therefore need:
\[\frac{R}{1-\delta}\ge\frac{T+\delta S}{1-\delta^2}
\]
\[\delta\ge\frac{T-R}{R-S}
\]
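The step between these two inequalities multiplies both sides by \(1-\delta^2=\left(1-\delta\right)\left(1+\delta\right)>0\):
\[R\left(1+\delta\right)\ge T+\delta S\iff\delta\left(R-S\right)\ge T-R
\]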
After (C, D), sticking to TFT produces (D, C), then (C, D), and so on, for a payoff of \(\frac{T+\delta S}{1-\delta^2}\). If player 1 deviates for one period by playing C, the outcome is (C, C), after which TFT prescribes (C, C) forever. The deviation payoff is therefore \(\frac{R}{1-\delta}\), and we need:
\[\delta\le\frac{T-R}{R-S}
\]
After (D, C), sticking to TFT again produces the alternating loop, but with S before T, so the payoff is \(\frac{\delta T+S}{1-\delta^2}\). A one-period deviation means playing D again; since player 2 copies player 1's previous D, the outcome is (D, D). Under TFT each player then copies the other's D, so (D, D) repeats forever and the deviation payoff is \(\frac{P}{1-\delta}\). For SPE we need:
\[\frac{\delta T+S}{1-\delta^2}\geq\frac{P}{1-\delta}
\] \[\delta\ge\frac{P-S}{T-P}
\]
Finally, after (D, D), the TFT path is D forever giving the payoff of \(\frac{P}{1-\delta}\). Deviating for one period will give S for a period, but then they get into the (D, C) to (C, D) loop again giving a payoff of \(\frac{\delta T+S}{1-\delta^2}\). For SPE we need:
\[\frac{P}{1-\delta}\geq\frac{\delta T+S}{1-\delta^2}
\] \[\delta\le\frac{P-S}{T-P}
\]
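The four one-deviation checks above can also be reproduced by brute-force simulation. The sketch below (all names are hypothetical) truncates the infinite discounted sum after a long horizon, lets both players follow TFT from a given last outcome, and compares this with a one-period deviation by player 1. The payoffs \(T=50\), \(R=32\), \(P=25\), \(S=12\) and \(\delta=0.7\) are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

Outcome = Tuple[str, str]  # (player 1's action, player 2's action)
Strategy = Callable[[List[str], List[str]], str]


def stage_payoff(outcome: Outcome, T: float, R: float, P: float, S: float) -> float:
    """Player 1's one-period payoff in the prisoner's dilemma."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[outcome]


def tit_for_tat(own: List[str], other: List[str]) -> str:
    """Copy the opponent's previous action (C if there is no history)."""
    return other[-1] if other else "C"


def continuation_value(history: List[Outcome], strategy: Strategy, deviate: bool,
                       T: float, R: float, P: float, S: float,
                       delta: float, periods: int = 3000) -> float:
    """Player 1's discounted payoff from the period after `history`.

    Both players follow `strategy`; if `deviate` is True, player 1 plays the
    opposite action in the first period only and then returns to `strategy`.
    The infinite sum is truncated after `periods` periods.
    """
    own = [a for a, _ in history]
    other = [b for _, b in history]
    total, discount = 0.0, 1.0
    for t in range(periods):
        a1 = strategy(own, other)
        a2 = strategy(other, own)
        if deviate and t == 0:
            a1 = "D" if a1 == "C" else "C"
        total += discount * stage_payoff((a1, a2), T, R, P, S)
        discount *= delta
        own.append(a1)
        other.append(a2)
    return total


if __name__ == "__main__":
    T, R, P, S, delta = 50, 32, 25, 12, 0.7  # assumed payoffs and discount factor
    for last in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
        stick = continuation_value([last], tit_for_tat, False, T, R, P, S, delta)
        dev = continuation_value([last], tit_for_tat, True, T, R, P, S, delta)
        verdict = "profitable deviation" if dev > stick else "no profitable deviation"
        print(last, round(stick, 3), round(dev, 3), verdict)
```

With these values \(\frac{T-R}{R-S}=0.9\) and \(\frac{P-S}{T-P}=0.52\), so at \(\delta=0.7\) the deviations after (C, C) and after (D, D) are profitable, in line with the conditions derived above.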
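For TFT to be a SPE all 4 conditions must hold. The first two together force \(\delta=\frac{T-R}{R-S}\) and the last two force \(\delta=\frac{P-S}{T-P}\), so TFT is a SPE only in the knife-edge case where the payoffs make these equal: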
\[\delta_{TFT-SPE}=\frac{T-R}{R-S}=\frac{P-S}{T-P}
\]
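For concrete payoffs the two thresholds can be compared exactly with rational arithmetic. The sketch assumes \(T=50\), \(P=25\), \(S=12\) with \(R\in\{32,48\}\), the values implied by the numerical conditions used for the paper's games later in this section; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def tft_spe_thresholds(T: int, R: int, P: int, S: int) -> Tuple[Fraction, Fraction]:
    """Return the two TFT one-deviation thresholds as exact fractions:
    (T-R)/(R-S) from the (C, C) and (C, D) histories, and
    (P-S)/(T-P) from the (D, C) and (D, D) histories."""
    return Fraction(T - R, R - S), Fraction(P - S, T - P)


if __name__ == "__main__":
    # Assumed payoffs T=50, P=25, S=12; R varies across games.
    for R in (32, 48):
        first, second = tft_spe_thresholds(50, R, 25, 12)
        print(R, first, second, "TFT can be SPE" if first == second else "TFT is not SPE")
```

For these payoffs the thresholds are 9/10 vs 13/25 (R=32) and 1/18 vs 13/25 (R=48), so neither game satisfies the equality. A payoff set that does is, for example, T=3, R=2, P=1, S=0, where both thresholds equal 1/2.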
Limited Punishment
Analytical solutions are generally difficult to find here, because the functional form of the condition depends on the number of punishment periods. The NE condition below illustrates this issue:
\[\sum_{s=1}^{k}\delta^s\geq\frac{T-R}{R-P}
\]
If \(k=1\) (TFT) the condition is linear in \(\delta\), if \(k=2\) it is quadratic, and so on. The NE condition therefore requires solving a polynomial inequality of order \(k\), which is easy when k is small but increasingly difficult, potentially with multiple roots, as k grows. SPE has a similar issue. I have already found the condition under which TFT is a SPE, which covers \(k=1\). I therefore set \(k=2\) and consider the same 4 histories, plus a few more that depend on how many rounds of punishment have passed.
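Because the left-hand side of the NE condition is increasing in \(\delta\) and tends to \(k\) as \(\delta\to1\), the smallest \(\delta\) that satisfies it can be found by bisection for any \(k\), without solving the polynomial explicitly. A minimal sketch; the function names are hypothetical and the payoffs \(T=50\), \(P=25\) are assumed for illustration.

```python
from typing import Optional


def punishment_sum(delta: float, k: int) -> float:
    """Left-hand side of the NE condition: delta + delta^2 + ... + delta^k."""
    return sum(delta**s for s in range(1, k + 1))


def min_delta_ne(T: float, R: float, P: float, k: int, tol: float = 1e-12) -> Optional[float]:
    """Smallest delta in [0, 1) with punishment_sum(delta, k) >= (T-R)/(R-P).

    The sum is increasing in delta and tends to k as delta -> 1, so a solution
    below 1 exists only if the target is below k; bisection then finds it.
    """
    target = (T - R) / (R - P)
    if target >= k:
        return None  # no delta < 1 satisfies the condition
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if punishment_sum(mid, k) >= target:
            hi = mid  # condition holds: the threshold is at or below mid
        else:
            lo = mid  # condition fails: the threshold is above mid
    return hi


if __name__ == "__main__":
    # Assumed payoffs T=50, P=25 (S does not enter the NE condition).
    for R in (32, 48):
        for k in (1, 2, 3):
            print(R, k, min_delta_ne(50, R, 25, k))
```

With R=32 the target \(\frac{T-R}{R-P}=\frac{18}{7}\) exceeds 2, so no \(\delta<1\) works for \(k=1\) or \(k=2\), while \(k=3\) has a solution.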
After (C, C), the on-path outcome is C forever, giving a payoff of \(\frac{R}{1-\delta}\). A one-period deviation gains T, but the opponent then punishes for the following 2 periods. In the first punishment period the player has returned to C and gets S, but then reciprocates the punishment, so the outcome in the 2nd punishment period is (D, D), giving P. This also restarts the opponent's punishment, so the two players keep punishing each other forever. The deviation payoff is therefore \(T+\delta S+\frac{\delta^2P}{1-\delta}\). For SPE we then need:
\[\frac{R}{1-\delta}\geq T+\delta S+\frac{\delta^2P}{1-\delta}
\] \[\delta\left(R-S\right)-\delta^2\left(P-S\right)\geq\left(1-\delta\right)\left(T-R\right)
\] \[\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0
\]
This quadratic can be solved with the quadratic formula, but its roots do not simplify any further.
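Since \(P-S>0\) the quadratic opens upwards, so the condition holds between its two real roots. A minimal sketch of that calculation; the function name is hypothetical and the payoffs \(T=50\), \(P=25\), \(S=12\) are assumed for illustration.

```python
import math
from typing import Optional, Tuple


def k2_cc_interval(T: float, R: float, P: float, S: float) -> Optional[Tuple[float, float]]:
    """Interval of delta where (P-S)d^2 - (T-S)d + (T-R) <= 0.

    The leading coefficient P-S is positive, so the quadratic is non-positive
    between its two real roots; returns None if there are no real roots.
    """
    a, b, c = P - S, -(T - S), T - R
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


if __name__ == "__main__":
    for R in (32, 48):  # assumed payoffs T=50, P=25, S=12
        print(R, k2_cc_interval(50, R, 25, 12))
```

For both values of R the upper root exceeds 1, so in practice only the lower root restricts \(\delta\).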
If the opponent is playing D, they are punishing the player, either in the first or in the second round of punishment. There are therefore 2 cases: if it is the first round, the history of the previous 2 rounds is {(D, C), (C, D)}, {(D, C), (D, D)} or {(D, D), (D, D)}; if it is the second round, the history is {(C, D), (D, D)}.
After {(D, C), (C, D)}, staying on strategy means the next outcome will be (D, D), and they will punish each other forever. The payoff is \(\frac{P}{1-\delta}\). Deviation for one period means the next outcome will be (C, D) and payoff is S. Going back to the strategy means the next outcome will be (D, C) for a payoff of T. The next period both players will be punishing each other, but lagged a period, thus we get (D, D) forever. Thus, the payoff of deviation is \(S+\delta T+\frac{\delta^2P}{1-\delta}\). Therefore, for SPE we need:
\[\frac{P}{1-\delta}\geq S+\delta T+\frac{\delta^2P}{1-\delta}
\] \[\left(T-P\right)\delta^2-\left(T-S\right)\delta+P-S\geq0
\]
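The discriminant is a perfect square, since \(\left(T-S\right)^2-4\left(T-P\right)\left(P-S\right)=\left(T+S-2P\right)^2\), so the roots simplify: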
\[\delta=\frac{T-S\pm\sqrt{\left(T-S\right)^2-4\left(T-P\right)\left(P-S\right)}}{2\left(T-P\right)}
\] \[\delta=\frac{T-S\pm\left(T+S-2P\right)}{2\left(T-P\right)}
\]
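The roots are therefore \(\frac{P-S}{T-P}\) and 1. Since \(T-P>0\), the quadratic is non-negative outside the interval between them, and because \(\delta<1\) this leaves only \(\delta\le\frac{P-S}{T-P}\) (which holds automatically if \(\frac{P-S}{T-P}\ge1\)). Thus, for SPE it is necessary that: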
\[\delta\le\frac{P-S}{T-P}
\]
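With integer payoffs, the perfect-square discriminant and the two roots can be confirmed exactly. A minimal sketch, assuming integer payoffs (the function name is hypothetical); for the assumed payoffs \(T=50\), \(P=25\), \(S=12\) it returns the roots 13/25 and 1.

```python
import math
from fractions import Fraction
from typing import List


def punishment_phase_roots(T: int, P: int, S: int) -> List[Fraction]:
    """Exact roots of (T-P)d^2 - (T-S)d + (P-S) = 0 for integer payoffs."""
    disc = (T - S) ** 2 - 4 * (T - P) * (P - S)
    s = math.isqrt(disc)
    # The discriminant equals (T+S-2P)^2, so its integer square root is exact.
    assert s * s == disc == (T + S - 2 * P) ** 2
    return sorted({Fraction(T - S - s, 2 * (T - P)), Fraction(T - S + s, 2 * (T - P))})


if __name__ == "__main__":
    print(punishment_phase_roots(50, 25, 12))
```

The set removes the duplicate root in the boundary case \(T+S=2P\), where both roots equal 1.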
After {(D, C), (D, D)}, the strategy prescribes (D, D) again, so the player gets P forever. Deviating for a period leads to (C, D), with payoff S. Unlike the previous case, the player also played D in the last round, so the opponent's new 2-round punishment still covers the following period: the next outcome is (D, D) rather than (D, C), and (D, D) continues forever. The deviation payoff is \(S+\frac{\delta P}{1-\delta}\), which is strictly worse for any \(\delta\) as \(S<P\). The same holds after {(D, D), (D, D)}: the strategy leads to (D, D) forever, while deviating leads to (C, D) and then (D, D) forever.
After {(C, D), (D, D)} the next on-strategy outcome will be (D, D) again as the opponent will start a new 2-round punishment. The deviation outcome will be (C, D) for a payoff of S, and the outcome after will be (D, D) forever after. This is strictly worse for any \(\delta\) as \(S<P\).
The last histories to check are those with the only other outcome in which the opponent plays C, namely (D, C). The (C, C) history above requires both players to have always played C, but (D, C) does not. It does, however, require the player to have played C in the previous round, since otherwise the opponent would still be punishing. The opponent could have played either C or D in that round, so we need to check both {(C, D), (D, C)} and {(C, C), (D, C)}. The former has partly been explored already. After {(C, D), (D, C)}, the next on-strategy outcome is (D, D), as the player punishes for a second period and the opponent reciprocates. This leads to P forever. Deviating for one period leads to (C, D), then (D, D) forever, which is strictly worse.
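After {(C, C), (D, C)}, the opponent begins its 2-round punishment, but the player has only seen the opponent play C, so the strategy prescribes C. The next on-strategy outcome is therefore (C, D), and, exactly as on the deviation path from (C, C) above, (D, D) follows forever, for a payoff of \(S+\frac{\delta P}{1-\delta}\). If the player instead deviates for one period and plays D, the outcome is (D, D) immediately and, with each player punishing the other, (D, D) repeats forever, for a payoff of \(\frac{P}{1-\delta}\). Since \(P>S\), this deviation is strictly profitable for every \(\delta\), so the one-deviation property fails at this history and the \(k=2\) strategy is not a SPE for any \(\delta\).
The remaining histories impose only 2 further conditions for \(k=2\):
\[\delta\le\frac{P-S}{T-P}
\] \[\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0
\]
For the payoffs in the paper's games (\(T=50\), \(P=25\), \(S=12\)):
\[\delta\le\frac{13}{25}=0.52
\] \[13\delta^2-38\delta+\left(50-R\right)\le0
\] \[\Rightarrow\frac{38-\sqrt{1444-52\left(50-R\right)}}{26}\le\delta\le\frac{38+\sqrt{1444-52\left(50-R\right)}}{26}
\] \[R=32\Rightarrow0.59\le\delta\le2.33
\] \[R=48\Rightarrow0.054\le\delta\le2.87
\]
For \(R=32\) these two conditions can never hold together, since the lower bound 0.59 exceeds 0.52. For \(R=48\) they hold for \(0.054\le\delta\le0.52\) (the upper roots exceed 1 and do not bind), but the failure at {(C, C), (D, C)} still rules out a SPE.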
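The histories above can also be checked by brute-force simulation. The sketch below encodes the \(k=2\) strategy as "play D if the opponent played D in either of the last two periods, otherwise C", a reading consistent with the deviation path described after (C, C) but still an assumption. It truncates the discounted sum after a long horizon and compares following the strategy with a one-period deviation from a given two-period history. The names, the payoffs (\(T=50\), \(R=48\), \(P=25\), \(S=12\)) and \(\delta=0.3\) are illustrative assumptions.

```python
from typing import List, Tuple

Outcome = Tuple[str, str]  # (player's action, opponent's action)


def stage_payoff(outcome: Outcome, T: float, R: float, P: float, S: float) -> float:
    """The player's one-period prisoner's dilemma payoff."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[outcome]


def punish_k(own: List[str], other: List[str], k: int = 2) -> str:
    """Assumed reading of k-period punishment: play D if the opponent played D
    in any of the last k periods, otherwise C."""
    return "D" if "D" in other[-k:] else "C"


def continuation_value(history: List[Outcome], deviate: bool, T: float, R: float,
                       P: float, S: float, delta: float, periods: int = 3000) -> float:
    """Discounted payoff after `history` when both players use punish_k, with an
    optional one-period deviation by the player in the first period."""
    own = [a for a, _ in history]
    other = [b for _, b in history]
    total, discount = 0.0, 1.0
    for t in range(periods):
        a1, a2 = punish_k(own, other), punish_k(other, own)
        if deviate and t == 0:
            a1 = "D" if a1 == "C" else "C"
        total += discount * stage_payoff((a1, a2), T, R, P, S)
        discount *= delta
        own.append(a1)
        other.append(a2)
    return total


if __name__ == "__main__":
    T, R, P, S, delta = 50, 48, 25, 12, 0.3  # assumed payoffs and discount factor
    histories = [
        [("C", "C"), ("C", "C")],
        [("D", "C"), ("C", "D")],
        [("D", "C"), ("D", "D")],
        [("C", "D"), ("D", "D")],
        [("C", "D"), ("D", "C")],
        [("C", "C"), ("D", "C")],
    ]
    for h in histories:
        stick = continuation_value(h, False, T, R, P, S, delta)
        dev = continuation_value(h, True, T, R, P, S, delta)
        print(h, round(stick, 3), round(dev, 3))
```

Comparing the two printed values for each history gives the one-deviation comparison for that history at the chosen \(\delta\).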
The lower bound of \(\delta\) is determined by:
\[\delta\geq\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)}
\]
It follows that it is necessary that:
\[\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)}\le\frac{P-S}{T-P}
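\] \[\left(T-S\right)-\frac{2\left(P-S\right)^2}{T-P}\le\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}
\]
When the left-hand side is non-negative (for the paper's payoffs it equals 24.48), both sides can be squared; otherwise the inequality holds trivially:
\[\left(T-S\right)^2-\frac{4\left(T-S\right)\left(P-S\right)^2}{T-P}+\frac{4\left(P-S\right)^4}{\left(T-P\right)^2}\le\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)
\]
Cancelling \(\left(T-S\right)^2\) and dividing by \(4\left(P-S\right)>0\):
\[\frac{\left(T-S\right)\left(P-S\right)}{T-P}-\frac{\left(P-S\right)^3}{\left(T-P\right)^2}-\left(T-R\right)\ge0
\]
For the paper's payoffs this reduces to \(T-R\le16.2448\), i.e. \(R\ge33.7552\), consistent with the bounds found for \(R=32\) and \(R=48\) above.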
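The necessary condition above (the lower root no greater than \(\frac{P-S}{T-P}\)) can also be evaluated directly for given payoffs, without the algebra. A minimal sketch, with a hypothetical function name and the assumed payoffs \(T=50\), \(P=25\), \(S=12\):

```python
import math


def k2_conditions_compatible(T: float, R: float, P: float, S: float) -> bool:
    """Check whether the lower root of (P-S)d^2 - (T-S)d + (T-R) does not
    exceed the upper bound (P-S)/(T-P)."""
    disc = (T - S) ** 2 - 4 * (P - S) * (T - R)
    if disc < 0:
        return False  # the quadratic is positive for every delta
    lower_root = (T - S - math.sqrt(disc)) / (2 * (P - S))
    return lower_root <= (P - S) / (T - P)


if __name__ == "__main__":
    for R in (32, 48):  # assumed payoffs T=50, P=25, S=12
        print(R, k2_conditions_compatible(50, R, 25, 12))
```

For R=32 this returns False and for R=48 it returns True, matching the numerical bounds found for the paper's games.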
The condition can be evaluated for specific payoffs, but it is already cumbersome in general for \(k=2\), which suggests that higher values of k will be considerably less tractable and that identifying any SPE for them will be difficult.
WSLS
Win-Stay Lose-Switch has the same possible histories as TFT, but different actions after them. From (C, C), the on-strategy path is C forever, with a payoff of \(\frac{R}{1-\delta}\). If one player deviates for one period by playing D, they gain T, which is a win, so they stay with D. The other player lost, so switches to D, and the outcome is (D, D), with a discounted payoff of \(\delta P\). Both consider this a loss and switch to C forever, giving \(\frac{\delta^2R}{1-\delta}\). The payoff of defecting for a period is thus \(T+\delta P+\frac{\delta^2R}{1-\delta}\), and for SPE we need:
\[\frac{R}{1-\delta}\geq T+\delta P+\frac{\delta^2R}{1-\delta}
\] \[\delta\geq\frac{T-R}{R-P}
\]
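The step between these two inequalities multiplies both sides by \(1-\delta>0\) and uses \(R-\delta^2R=R\left(1-\delta\right)\left(1+\delta\right)\):
\[R\left(1-\delta\right)\left(1+\delta\right)\ge\left(1-\delta\right)\left(T+\delta P\right)\iff R\left(1+\delta\right)\ge T+\delta P\iff\delta\left(R-P\right)\ge T-R
\]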
After (C, D), WSLS prescribes that the player switches and the opponent stays, so the outcome is (D, D) and the payoff is P. Both then switch to C forever and get \(\frac{\delta R}{1-\delta}\). If the player deviates by not switching, the outcome is still (C, D) and they get S; the players then reach (D, D) one period later and (C, C) after that. The payoff of deviating for a period is then \(S+\delta P+\frac{\delta^2R}{1-\delta}\). Therefore, for SPE we need:
\[P+\frac{\delta R}{1-\delta}\geq S+\delta P+\frac{\delta^2R}{1-\delta}
\] \[\delta\geq-\frac{P-S}{R-P}
\]
This is always satisfied, as \(P>S\) and \(R>P\).
After (D, C), following WSLS means staying with D, while the opponent switches, so the outcome is (D, D). Both then switch to C forever, so the payoff is \(P+\frac{\delta R}{1-\delta}\). Deviating for a period means switching as well, so the outcome is (C, D); the continuation is as described above, giving \(S+\delta P+\frac{\delta^2R}{1-\delta}\). The SPE condition is the same as before, so it is always satisfied.
Finally, after (D, D), if both follow the strategy they play (C, C) forever, giving \(\frac{R}{1-\delta}\). If one player deviates for a period, the outcome is (D, C), giving T. In the following period the opponent switches to D, giving \(\delta P\), and after that both switch to C, giving \(\frac{\delta^2R}{1-\delta}\). This is the same SPE condition as after (C, C), so it is satisfied whenever that one is.
Thus, for WSLS to be a SPE we only need \(\delta\geq\frac{T-R}{R-P}\). This is the same as the NE condition, so if WSLS is a NE, it is also a SPE.
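As with TFT, the four WSLS checks can be reproduced by simulation. In the sketch below (names hypothetical), WSLS is encoded as: repeat the previous action after the opponent played C (a payoff of T or R) and switch after the opponent played D (S or P), which matches the win/loss classification used above. The payoffs \(T=50\), \(P=25\), \(S=12\), \(R\in\{32,48\}\) and \(\delta=0.5\) are illustrative assumptions.

```python
from typing import List, Tuple

Outcome = Tuple[str, str]  # (player's action, opponent's action)


def stage_payoff(outcome: Outcome, T: float, R: float, P: float, S: float) -> float:
    """The player's one-period prisoner's dilemma payoff."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[outcome]


def wsls(own: List[str], other: List[str]) -> str:
    """Win-stay, lose-switch: after the opponent played C (payoff T or R) repeat
    the previous action; after the opponent played D (payoff S or P) switch."""
    if not own:
        return "C"
    if other[-1] == "C":
        return own[-1]
    return "D" if own[-1] == "C" else "C"


def continuation_value(last: Outcome, deviate: bool, T: float, R: float, P: float,
                       S: float, delta: float, periods: int = 3000) -> float:
    """Discounted payoff after outcome `last` when both players use WSLS, with an
    optional one-period deviation by the player in the first period."""
    own, other = [last[0]], [last[1]]
    total, discount = 0.0, 1.0
    for t in range(periods):
        a1, a2 = wsls(own, other), wsls(other, own)
        if deviate and t == 0:
            a1 = "D" if a1 == "C" else "C"
        total += discount * stage_payoff((a1, a2), T, R, P, S)
        discount *= delta
        own.append(a1)
        other.append(a2)
    return total


if __name__ == "__main__":
    T, P, S, delta = 50, 25, 12, 0.5  # assumed payoffs and discount factor
    for R in (32, 48):
        print("R =", R, "threshold (T-R)/(R-P) =", round((T - R) / (R - P), 3))
        for last in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
            stick = continuation_value(last, False, T, R, P, S, delta)
            dev = continuation_value(last, True, T, R, P, S, delta)
            print("  ", last, "deviation profitable" if dev > stick else "no profitable deviation")
```

With these values the threshold \(\frac{T-R}{R-P}\) is 18/7 for R=32, above 1, so the deviation after (C, C) is profitable for every \(\delta<1\); for R=48 it is 2/23 (about 0.087), so \(\delta=0.5\) passes all four checks.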